UNITED STATES PATENT APPLICATION 






of 

Ronald M. Kaplan, 
David L. Hecht 
Glen W. Petrie 
and 

Colin Luckman 
for 



A SYSTEM AND METHOD FOR DISTRIBUTING 
MULTILINGUAL DOCUMENTS 



LAW OFFICES 

Finnegan, Henderson,
Farabow, Garrett,
& Dunner, L.L.P.

1300 I STREET, N.W.
WASHINGTON, DC 20005
202-408-4000



Attorney Docket No.: 07447-0013-00000
Xerox Reference No. D/98591 

Related Applications

This application is related to U.S. Patent Application Serial Nos. 09/AAA,AAA, entitled
"Assist Channel Coding With Character Classifications" (Attorney Docket No. D/AO038),
09/BBB,BBB, entitled "Assist Channel Coding With Vertical Block Error Correction" (Attorney
Docket No. D/AO039), 09/CCC,CCC, entitled "Assist Channel Coding With Convolution
Coding" (Attorney Docket No. D/AO040), and 09/DDD,DDD, entitled "Assist Channel Coding
Using A Rewrite Model" (Attorney Docket No. D/AO041), which are hereby incorporated by
reference.

The present invention relates generally to document image encoding and decoding, and
more specifically to a method and apparatus for translating a document written in a first language
into a second language using a code embedded in the document.

Description of the Prior Art

Large multinational companies often engage in official and cross-organizational
communication using a single working language. More often than not, the language of choice is
English. While this may be a convenient and natural choice for English-speaking people,
workers of other native languages would most likely prefer to receive communications in their
own language and may have better comprehension when they receive information this way.
There are about 3,000 known languages in the world (the number varies according to what is
counted as a language; dialects that are clearly just that are not included in this number), and
each is the vehicle of a culture that is different in at least some ways from any other culture.
Everywhere, when speakers of different languages have come in contact, somebody had to learn
a foreign language. There have always been individuals who found it interesting or profitable to
do this. The earliest of explorers and traders were forced by necessity to learn to understand one
another's language or to perish in the economic as well as the physical worlds. This, as we all
know, resulted in extensive and long language studies with the erudite academicians handling the 
complex aspects of the communications exchange, while the more pragmatic day-to-day traders 
and businessmen developed short, terse means of communication. The advent of personal
computers and microprocessors has brought a flood of modern-day approaches to this age-old
problem. The devices have ranged from direct word for word translation devices to key word 
translation directly into phrases. For example, U.S. Patent No. 4,412,305 relates to a translation 
device wherein a single word is used as the input to produce the translation of entire groups of 
words, such as sentences or phrases; a single word entered will access particular sentences within 
limited subject categories; letters within words or groups of words produce an equivalency 
detectable by a comparison circuit resulting in the representation in a second language of a 
plurality of words regardless of whether it is a noninflected word or an inflected word; and 
phrases can be tied to computer specified aural or visual control messages for use by an operator 
who chooses to use a particular language in the operation of a machine tool. 

U.S. Patent No. 5,490,061 similarly discloses a method of translating between two 
languages that utilizes an intermediate language bridge, whereby any one of a plurality of source 
languages is compatibly translated into the intermediate language, and then into any one of a 
plurality of target languages. There are several such intermediate languages, the most common 
of which is Esperanto created in the 1880's by Dr. Ludovic Lazarus Zamenhof (1859-1917) of 
Poland. It contains a compressed vocabulary (roughly one-tenth the number of words as
English) and a completely simplified and regular grammar. This eliminates the need for many
complex mathematical statements to account for the grammatical differences between existing
national languages.

It is clear from a study of these and other related prior art references that a direct
translation from one language to another includes a multiplicity of roadblocks, either in the lack
of an available direct translation or in the relatively large dictionary of words that must be used
to effect the translation. Given the interest in obtaining translations using relatively small
conversion routines and the wide variety of usage rules in and among different languages, it is
desirable to provide an apparatus and means for easily obtaining an accurate
translation of a document such that regardless of the source or destination languages, the
translation of the document will always be linguistically accurate.

When a document is created in a first language, the ideal solution from the end user's
standpoint would be to receive the document from the creator in any language of the user's
choosing, regardless of the first language. From the sender's standpoint, the optimum solution is
to send a single translation of the document to each user and to provide them with the capability
to accurately convert it to any language of the user's choosing. Since the former solution would
likely result in an administrative nightmare, the present invention seeks to develop a solution to
the problem more closely related to the latter solution. In operation, the user would preferably
receive a version of the document written in a first human-readable language. At the user's
request, the document could be translated into a second language. The mechanism for translating
the document from a first language to a second language could reside in the document or it could
be generated entirely from a source external to the document. It is imagined that in order to
create a highly accurate translation, it would be best to embed codes in the document that would
assist in the translation of the document from the first language to at least a second language.

The more efficiently the embedded code can be compressed, the more foreign languages 
can be encoded on the face of a document. At one extreme, this problem can be solved simply 
through the use of a standard decompression routine. In other words, take the translation, 
compress it according to a given scheme and store the compressed byte sequences in glyphs 
using machine-readable marks, such as glyph marks used in Xerox DATAGLYPH codes. These 
logically-ordered, single-bit digital quanta may be encoded by respective elongated slash-like 
glyphs tilted to the left and right of vertical by approximately +45° and -45° for encoding logical
"0s" and "1s," respectively. The mutual orthogonality of the glyphs for the two logical states of
0 and 1 of these single bit digital quanta enhances the discriminability of the code. Thus, the 
code pattern embedded in the glyphs can be recovered from an image of the glyphs, even when 
the code pattern is written on a fine grain pattern to cause the code pattern to have a generally 
uniform grayscale appearance. The machine-readable marks can be captured in an image, and 
the image can be analyzed to determine codes embedded in the image. Another advantage of 
glyph marks is that they may have an unobtrusive visual appearance. If the glyph marks are 
sufficiently small, the glyphs appear as grayscale to the unaided eye. For example, a text of 
about 3000 characters (a page) could be represented in a glyph pattern of 3 or 4 square inches. 
The device that produces the translation simply needs to access the appropriate decompression
algorithm. Unfortunately, the requirement that the translation information for each page be
contained on the same page as the human-readable text rules out the use of standard adaptive
compressors like an LZW (Lempel-Ziv-Welch) code since each page would have to be coded
separately, thereby eliminating the advantage usually associated with this and other similar
compression schemes.
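
By way of illustration only, the cost of coding each page separately can be observed with a short Python sketch. It uses DEFLATE from the standard zlib module as a stand-in for an adaptive compressor (LZW itself is not in the Python standard library), and the sample pages are invented for the example.

```python
import zlib

# Hypothetical sample: five "pages" that share most of their vocabulary,
# as consecutive pages of one document usually do.
pages = [
    (f"Page {n}: the quarterly report describes revenue, costs and "
     f"staffing for region {n}. ") * 20
    for n in range(1, 6)
]


def compressed_size(text: str) -> int:
    """Size in bytes of the DEFLATE-compressed UTF-8 text."""
    return len(zlib.compress(text.encode("utf-8"), 9))


# Each page compressed on its own, so it can be decoded without the others.
per_page_total = sum(compressed_size(p) for p in pages)

# The whole document compressed as one stream, so later pages can reuse
# strings already seen on earlier pages.
whole_document = compressed_size("".join(pages))

print("separately:", per_page_total, "bytes; together:", whole_document, "bytes")
```

On such input the separately coded pages should total more bytes than the single stream, because each page starts with no learned context.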

A dictionary-based compaction scheme similar to the system disclosed in commonly 
assigned U.S. Patent No. 5,787,386 to Kaplan et al., is an alternative method for encoding the 
translation data. Here, a computerized multilingual translation dictionary includes a set of words
and phrases for each of the languages it contains, plus a mapping that indicates the translations in
other languages that correspond to each of the words and phrases. 
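
The following minimal sketch illustrates the general idea of such a dictionary, not the scheme of U.S. Patent No. 5,787,386 itself. The three-language word lists and the functions encode and decode are hypothetical, and the sketch assumes a one-to-one, same-order correspondence between words, which real language pairs rarely have.

```python
# Hypothetical miniature dictionary: entry number i names the same concept
# in every language, so the shared entry number acts as the mapping.
DICTIONARY = {
    "en": ["the", "house", "is", "red"],
    "fr": ["la", "maison", "est", "rouge"],
    "de": ["das", "Haus", "ist", "rot"],
}


def encode(words, lang):
    """Replace each word by its entry number in the given language."""
    index = {word: number for number, word in enumerate(DICTIONARY[lang])}
    return [index[word] for word in words]


def decode(codes, lang):
    """Look each entry number up in the word list of the target language."""
    return [DICTIONARY[lang][code] for code in codes]


codes = encode(["the", "house", "is", "red"], "en")
print(codes, decode(codes, "fr"))
```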

At the other extreme, a machine translation (MT) solution could be implemented. In 
other words, the human-readable text is interpreted by an optical character reader (OCR), run 
through the MT and the results are outputted to the user. Unfortunately, a fully accurate MT 
capability is not yet available. Therefore, any translation outputted from an MT algorithm would
not likely provide an easily understandable translation in the user's desired language. 

Thus, a goal of this invention is to provide a document written in a first human-readable 
language and including multiple portions of machine-readable codes that allows the user to 
accurately convert the document from the first language to a plurality of other natural languages. 

Another goal of this invention is to therefore provide a human-readable document with a 
plurality of embedded multilingual translations stored thereon such that one of a plurality of 
selected translations can be decoded and converted into human-readable form, using information 
present on the face of a document. 

Summary

In accordance with the purpose of the present invention, as embodied and broadly
described, the invention provides a method and apparatus for generating and distributing
multilingual documents. The multilingual documents are comprised of primary information
consisting of human-readable text and secondary information consisting of machine-readable
data such that a translation of the text is accomplished by converting the human-readable text
into a second language through the use of the decoded machine-readable data. In a preferred
embodiment, the machine-readable data is embedded in an image on the document using glyphs.
A conversion code in accordance with the present invention further reduces the number of bytes
it takes to code a translation so that multiple translations can be placed on the face of a document
such that they are artistically hidden on exactly the same page surface that contains the
corresponding source language text. Thus, each page can be translated by itself, even when other
pages of the document are unavailable.

Additional objects and advantages of the invention will be set forth in part in the 

description which follows, and in part will be clear from the description, or may be learned by 

practice of the invention. The objects and advantages of the invention will be realized and 

attained by means of the elements and combinations particularly pointed out in the appended 

claims. It is to be understood that both the foregoing general description and the following 

detailed description are exemplary and explanatory only and are not restrictive of the invention, 

as claimed. 

Brief Description of the Drawings 

The accompanying drawings, which are incorporated in and constitute a part of this 
specification, illustrate an embodiment of the invention and, together with the description, serve 
to explain the principles of the invention.

Fig. 1 illustrates the conversion of an electronic document into a multilingual document
in accordance with the present invention; 

Fig. 2 is a flow diagram of the general steps for creating a multilingual document in 
accordance with the present invention; 

Fig. 3 illustrates the conversion of a hardcopy multilingual document into an electronic 
document written in a second human-readable language; 

Fig. 4 illustrates a self-clocking glyph code pattern and a portion of its binary 
interpretation; 

Fig. 5 illustrates a user interface that may be used for selecting at least a second human- 
readable language to extract from a multilingual document in accordance with the present 
invention; and 

Fig. 6 is a flow diagram of the general steps for converting a hardcopy multilingual 
document into a translated version of the hardcopy document in a second human-readable 
language. 

DETAILED DESCRIPTION OF THE INVENTION 

Reference will now be made in detail to embodiments of the invention, examples of 
which are illustrated in the accompanying drawings. Apparatus and methods disclosed herein 
consistent with the principles of the invention provide a human-readable document with a 
plurality of multilingual translations stored thereon. The multilingual document in accordance 
with a preferred embodiment of the present invention is comprised of a human-readable portion 
and a machine-readable portion. The human-readable portion is a translation of the multilingual
document in a first language and the machine-readable portion is an embedded code for
converting the human-readable portion into at least a second human-readable language.

FIGs. 1 and 3 illustrate the general operating environment of the present invention, in
which documents are exchanged between the electronic and the hardcopy domain. FIG. 1
illustrates an electronic document 102 that is viewed and/or created on a display 104, or the like,
and a hardcopy document 106 that is rendered on a physical medium such as paper by a printer
108. While this specification describes the process as if it begins with an electronic document
being displayed on display 104, it is important to understand that the process may actually begin
when coded text (ASCII) is received but not displayed by system 100. For example, text could
be received as a result of a database query, OCR input, user input, etc., and the data could be
processed and the output created all prior to displaying the inputted data. A multilingual encoding
module 110 shown in FIG. 1 receives image or coded data from an electronic document
processing system (not shown) that is used for creating and/or editing the electronic document
102 and produces as output augmented image data 122.

As shown in FIG. 1, multilingual encoding module 110 is comprised of an
encoding/compression module 116 and a merge module 120. The image or coded data input to
the multilingual encoding module 110 is defined herein as primary channel 112 data that
includes the image or coded data 114 of the inputted text. Encoding/compression module 116 of
the multilingual encoding module 110 generates several different foreign language
translations in machine-readable code. These foreign language translations are defined herein as
secondary or multilingual channel data. This data is output to merge module 120 along
multilingual channel 118. Merge module 120 of the multilingual encoding module 110 merges
the primary channel data 112 with the multilingual channel data 118 to produce the augmented
image data 122. In the embodiment shown in FIG. 1, the primary channel 112 and the
multilingual channel 118 of the augmented image data 122 are rendered on the hardcopy
document 106 at 124 and 126, respectively. The primary channel data 124 is human-readable
information, while the multilingual channel data 126 is optimized to be machine-readable
information.

FIG. 2 illustrates a flow diagram of the general steps for creating a multilingual document
in accordance with the present invention. As shown in FIG. 2, the process begins in step 210
when a user inputs text into terminal 100. The methods for inputting text into terminal 100 are
well-known and numerous, including direct entry via keyboard (not shown), download from a
memory location or network, scanner, etc. After the data is inputted, processing flows to step
220 where the inputted text is transmitted to multilingual encoding module 110. Multilingual
encoding module 110 creates augmented image data 122 in step 230 and a hardcopy output of
the multilingual document is created in step 240. It is imagined that encoding/compression
module 116 of multilingual encoding module 110 may have a predetermined collection of
languages for which it can create machine-readable codes. It may also be programmable
such that the languages supported can be changed as often as the user would like.

FIG. 3 illustrates the uploading of the hardcopy document 106, with data from a primary
channel 124 and a multilingual encoding 126 rendered thereon, from the hardcopy domain to the
electronic domain. As shown in FIG. 3, multilingual decoding module 304 is comprised of an
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system using scanner 308. A multilingual decoding module 304 then receives bitmap image or 
coded data 306 from scanner 308. An image extraction module 310 in the multilingual decoding
module 304 separates the multilingual channel 118 data from bitmap primary channel 314 data.
After a user selects the desired translation for output, a decoding/decompression module 312 in
the multilingual decoding module 304 decodes and decompresses the appropriate portion of the
multilingual channel 118 and passes the data along decoded and decompressed multilingual
channel 319 to decoder module 316. The decoder module 316 performs OCR on the bitmap
primary channel 314 and passes the primary channel data 112 to terminal 100. In a preferred
embodiment, multilingual channel data 118 and 126 is comprised of data to assist the conversion
of primary channel data 112 and 124, respectively, from a first human-readable language into at
least a second human-readable language. It is also envisioned that multilingual channel data 118
and 126 may additionally include information helpful in performing OCR of the primary channel
data 112 and 124, respectively. A method for assisting in performing OCR of primary channel
data through the use of an assist channel is described in commonly assigned, co-pending U.S.
Patent Application entitled "Assist Channel Coding With Character Classifications" (Serial No.
09/AAA,AAA), the contents of which are expressly incorporated by reference. Once accurately
reconstructed using the decoder module 316, the primary channel data 112 can be displayed on
display 104 as image data 114.
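
The data flow of FIGs. 1 and 3 may be sketched in Python as follows. The class and function names are hypothetical, zlib stands in for whatever coding the encoding/compression module would use, and plain strings stand in for the scanned bitmap, so the glyph rendering and OCR steps are omitted.

```python
import zlib
from dataclasses import dataclass


@dataclass
class AugmentedImageData:
    """Stand-in for augmented image data 122: both channels of one page."""
    primary: str                      # human-readable text (primary channel)
    multilingual: dict[str, bytes]    # language -> machine-readable payload


def encode_page(text: str, translations: dict[str, str]) -> AugmentedImageData:
    """Encoding/compression step followed by the merge step."""
    payloads = {
        lang: zlib.compress(translated.encode("utf-8"))
        for lang, translated in translations.items()
    }
    return AugmentedImageData(primary=text, multilingual=payloads)


def decode_page(page: AugmentedImageData, language: str) -> str:
    """Extract the multilingual channel, pick the user's language, decode it."""
    payload = page.multilingual[language]
    return zlib.decompress(payload).decode("utf-8")


page = encode_page("Good morning.", {"fr": "Bonjour.", "de": "Guten Morgen."})
print(decode_page(page, "de"))
```

Only the payload for the selected language is decompressed, mirroring the selection that precedes decoding/decompression module 312.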

Accordingly, the multilingual encoding module 110 aids in converting electronic image 
or coded data stored in a first language into a hardcopy document 106 that includes human- 
readable data written in a first language and machine-readable foreign language translation data 
written in machine-readable codes. Multilingual decoding module 304 aids in converting an
electrical image of human-readable data written in a first language and machine-readable data,
into an electrical image of the hardcopy document 114 written in at least a second human-
readable language.

While FIGs. 1 and 3 show the placement of the multilingual channel data 126 at the
bottom of the page of a hardcopy document 106, it is understood that the data could be placed
anywhere on the face of the document without departing from the spirit and scope of this
invention.

It will be appreciated by those skilled in the art that there exist multiple operating
arrangements of the multilingual encoding/decoding modules 110 and 304 shown in FIGs. 1 and
3. In some embodiments, the multilingual encoding/decoding modules 110 and 304 are
embedded in computer systems that operate integral with terminal 100 or the printer 108, or that
operate separate from terminal 100 and printer 108. In other embodiments, the multilingual
encoding/decoding modules 110 and 304 operate integral with each other or separate from each
other on one or more computer systems.

Multilingual encoding seeks to provide a compressed foreign language translation of the 
primary information in machine readable form, rendered on the hardcopy document. In an 
alternate embodiment, the encoded multilingual document information could be stored in a 
memory location (not rendered on a hardcopy medium) thereby providing the capability to store 
multiple translations in a minimal amount of additional storage space per language. 

In a preferred embodiment, the encoded multilingual information appears on the face of
the hardcopy document as a compact, visually benign representation of the primary information.
As shown in FIG. 4, glyph marks are composed of elongated slash-like marks or "glyphs" 422 and
423 that are written on a generally regular rectangular lattice of centers on a suitable recording
medium 424. Suitably, the glyphs 422 and 423 are printed by a printer (not shown) operating at
300 d.p.i. to 600 d.p.i. to write 4 pixel x 4 pixel to 7 pixel x 7 pixel representations of the glyphs
422 and 423 on regularly spaced centers that are distributed widthwise and lengthwise of the
recording medium 424 to produce the code pattern 421. The glyphs of these fine grain glyph
code patterns are not easily resolved by the unaided human eye when the code patterns are
viewed under standard lighting conditions and at normal reading distances, so the code pattern
421 typically has a generally uniform gray scale appearance. Alternatively, the glyph marks may
be modulated in an area to form a glyph halftone image or glyphtone as disclosed in commonly
assigned U.S. Patent Nos. 5,315,098 and 5,706,099, the contents of which are expressly
incorporated by reference. Nevertheless, the glyph code is still capable of effectively
communicating machine-readable digital information. To carry out this function, the glyphs 422
and 423 usually are tilted to the left and right, at about +45° and -45° with respect to the
longitudinal dimension of the recording medium 424 to encode binary "1's" and "0's,"
respectively, as shown at 425.
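
A minimal Python sketch of the bit-to-glyph correspondence follows. It draws the two tilts as the characters "/" and "\", and the choice of which character stands for a binary 1 is an assumption made for illustration. The function names and the row width are likewise hypothetical, and no self-clocking or error correction is modeled.

```python
# Assumed orientation for illustration: "/" encodes 1 and "\" encodes 0.
GLYPH_FOR_BIT = {"1": "/", "0": "\\"}
BIT_FOR_GLYPH = {glyph: bit for bit, glyph in GLYPH_FOR_BIT.items()}


def bytes_to_glyph_rows(data: bytes, width: int = 16) -> list[str]:
    """Write each bit as one glyph, laid out in rows of `width` glyphs."""
    bits = "".join(f"{byte:08b}" for byte in data)
    glyphs = "".join(GLYPH_FOR_BIT[bit] for bit in bits)
    return [glyphs[i:i + width] for i in range(0, len(glyphs), width)]


def glyph_rows_to_bytes(rows: list[str]) -> bytes:
    """Read the glyphs back row by row and regroup the bits into bytes."""
    bits = "".join(BIT_FOR_GLYPH[glyph] for row in rows for glyph in row)
    return bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))


rows = bytes_to_glyph_rows(b"Hi")
for row in rows:
    print(row)
```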

In a preferred embodiment, the encoded multilingual data represents a code C that
describes a set of editing operations that can be applied to the primary information to convert it
from a first (presentation) language into a second language. Assume that for each page of text P
in a presentation language (e.g., English), there is an accurate translation ATL in each of a
plurality of languages L, each with its own glyph description. Also assume a plurality of
processing routines RL (perhaps one for each language) that can be applied to P to produce a
translation of P into language L. The quality of this translation RL(P) may be anywhere on the
continuum from very good to very bad. In any event, it is assumed that code C describes a set of
editing functions E necessary to convert RL(P) into ATL. In the case where RL(P) closely
approximates ATL, C will describe very minor, if any, editing functions. When RL(P) is very
bad, C will describe more significant editing functions to apply to RL(P), making it identical to
ATL. In other words, we compute a C such that ATL = E(C, RL(P)). Assuming that E and RL
exist in multilingual encoding/decoding modules 110 and 304 with an OCR engine, we need
merely transmit C in glyphs on the page containing the human-readable text P. Multilingual
encoding/decoding modules 110 and 304 would then reconstruct ATL by OCR'ing P, applying
RL to the result, and then correcting according to instructions in C. A method for reading and
decoding a channel is described in commonly assigned, copending U.S. Patent Application
entitled "Assist Channel Coding With Character Classifications" (Serial No.
09/AAA,AAA), the contents of which are hereby expressly incorporated by reference.
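
The relation ATL = E(C, RL(P)) can be made concrete with a small Python sketch. It is one possible realization, not the coding of the invention: the code C is assumed to be a list of word-level edits computed with the standard difflib module, and the two French word sequences standing in for RL(P) and ATL are invented for the example.

```python
from difflib import SequenceMatcher


def make_code(rough: list[str], accurate: list[str]) -> list[tuple]:
    """Compute C: the edits that turn RL(P) into ATL. Unchanged runs are
    omitted, so a better RL(P) yields a shorter C."""
    matcher = SequenceMatcher(a=rough, b=accurate, autojunk=False)
    return [
        (i1, i2, accurate[j1:j2])
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"
    ]


def apply_code(code: list[tuple], rough: list[str]) -> list[str]:
    """E(C, RL(P)): replay the edits against the rough translation."""
    out, pos = [], 0
    for i1, i2, replacement in code:
        out.extend(rough[pos:i1])    # copy the untouched run
        out.extend(replacement)      # replace, insert, or (if empty) delete
        pos = i2
    out.extend(rough[pos:])
    return out


rough = "je suis allé à la banque du fleuve".split()    # stands in for RL(P)
accurate = "je suis allé sur la rive du fleuve".split()  # stands in for ATL
C = make_code(rough, accurate)
print(C)
print(" ".join(apply_code(C, rough)))
```

When RL(P) already equals ATL, make_code returns an empty list, which corresponds to the case in which C describes no editing functions at all.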

At one extreme, the secondary information could simply represent a compressed version
of the translation of the primary text. That is, the system could take the translation, compress it
according to a predetermined compression scheme, and store the compressed byte sequences in
glyphs. The multilingual encoding/decoding modules 110 and 304 that produce the translation
would simply retrieve the appropriate decompression algorithm and apply it to the compressed
byte sequences. In this case, RL is the null computation in the equation ATL = E(C, RL(P)), C is
the output of the compression routine, and E is the decompressor. A dictionary-based compaction
scheme similar to the system disclosed in commonly assigned U.S. Patent No. 5,787,386 to
Kaplan et al. is one such method for encoding the translation data.
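
This degenerate case may be sketched as follows, with zlib assumed as the predetermined compression scheme and all function names hypothetical. The sketch only shows how the three roles in ATL = E(C, RL(P)) are filled when RL contributes nothing.

```python
import zlib


def r_l(page_text: str) -> str:
    """RL as the null computation: it contributes nothing to the result."""
    return ""


def make_code(accurate_translation: str) -> bytes:
    """C is simply the output of the compression routine."""
    return zlib.compress(accurate_translation.encode("utf-8"), 9)


def e(code: bytes, rough: str) -> str:
    """E is the decompressor; the (empty) rough translation is ignored."""
    return zlib.decompress(code).decode("utf-8")


page = "The meeting starts at nine."            # P, the primary text
atl = "La réunion commence à neuf heures."     # ATL, supplied by a translator
code = make_code(atl)
print(e(code, r_l(page)) == atl)
```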
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At the other extreme, it could be assumed that a Machine Translation (MT) capability 
exists in the multilingual decoding module 304 such that a translation is produced simply by 
OCR'ing the (English) primary information and converting it into a desired translation using the 
MT capability residing on the multilingual decoding module 304. In this case, 
RL(P)=MT(P)=ATL, such that C and E will be empty. Given that the currently available MT 
systems (Systran, Logos, etc.) are not good enough to produce ATL without some level of post-
processing, it is more reasonable to assume that after applying the MT to P, C and E would still
perform some amount of processing to improve the readability of the result. Very generally, the
correction code C would contain the operations that the post-editor E performs to produce the ATL. For
example, suppose that RL is an MT system that does an adequate job when the meaning of the
source sentence is clear, but in the absence of world-knowledge is unable to resolve the
numerous ambiguities typically found in natural language text (e.g., is "bank" a financial
institution or a steep natural incline?). This type of disambiguation can be performed by framing
a series of questions to a person fluent with the source language, and the answers are then used 
by the translation software to make the correct choices of word sense and sentence patterns in the 
target language. The correction code C would record the answers to these questions so that the 
fluent person's knowledge is effectively available for guidance when the multilingual decoding 
module 304 is requested to make the translation. 

Now suppose that RL has a word-for-word translation dictionary that lists all senses that
an English word can have in a translation. In other words, RL produces a word-for-word
representation of all these senses. The correction code C would then indicate for each word
which sense is appropriate in the context (e.g., gives a sense number). For senses that are not
available, C would contain the actual spelling of the correct word for the ATL. Code C may also
provide a permutation vector that tells how to order the words in the translation (e.g., what to
insert, what to delete, etc.). Morphology may be included to minimize the coding.
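
As a hedged illustration of this word-for-word case, the sketch below treats C as three parts: a sense number per source word, an optional literal spelling for senses the dictionary lacks, and a permutation vector giving the output order. The tiny English-to-French sense table and the function name are hypothetical, and insertion, deletion, and morphology are not modeled.

```python
# Hypothetical word-for-word dictionary: every sense RL can offer per word.
SENSES = {
    "the": ["le", "la"],
    "red": ["rouge"],
    "bank": ["banque", "rive"],
}


def apply_correction(source_words, sense_numbers, order, spellings=None):
    """Pick one sense per source word, then reorder the chosen words.

    spellings maps a source position to a literal target word, for use
    when none of the listed senses is correct."""
    spellings = spellings or {}
    chosen = [
        spellings[i] if i in spellings else SENSES[word][sense_numbers[i]]
        for i, word in enumerate(source_words)
    ]
    return [chosen[i] for i in order]


source = ["the", "red", "bank"]
print(apply_correction(source, sense_numbers=[1, 0, 1], order=[0, 2, 1]))
# ['la', 'rive', 'rouge']
```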

In addition to providing a correction code C, it is imagined that the secondary 
information may also encode information that makes the process of translating the primary 
information easier or more accurate. For example, the secondary information might describe 
exactly what encoding scheme, compression algorithm, MT, etc. was used, the settings that were
used (e.g., font identifier, error correction data, codes for characters, etc.), what datasets must be
available in the multilingual encoding/decoding modules 110 and 304, and hints that might ease
the burden of translating the primary information (e.g., author, dialect, source date for old
documents, supplementary dictionary for specialized words, acronyms, etc.).
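
One way such descriptive information could be carried is as a small header serialized ahead of the correction code. The sketch below is an assumption made for illustration: the field names, the example values, and the use of JSON are all hypothetical and are not specified by this description.

```python
import json
from dataclasses import dataclass, field, asdict


@dataclass
class ChannelHeader:
    """Hypothetical header describing how the secondary channel was made."""
    compression: str                          # compression algorithm used
    translator: str                           # RL routine or MT system used
    required_datasets: list[str] = field(default_factory=list)
    hints: dict[str, str] = field(default_factory=dict)


header = ChannelHeader(
    compression="deflate",
    translator="word-for-word",
    required_datasets=["en-fr-dictionary"],
    hints={"dialect": "en-GB", "author": "unknown"},
)

# Serialize to bytes that could precede the correction code C, then restore.
payload = json.dumps(asdict(header), sort_keys=True).encode("utf-8")
restored = ChannelHeader(**json.loads(payload))
print(restored == header)
```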

Fig. 5 is a block diagram of a user interface that may be used to extract one of a plurality 
of foreign language translations from a source document. In operation, the system captures a 
user-selected portion of a substrate 66, decodes the data found on the substrate 106, and performs 
further processing of the electronic document 102, as described herein. In the alternative, a user 
may simply select one of a plurality of languages from a list of languages on a display 104 or the 
user may select one of a plurality of buttons (each associated with a different language) on a 
control panel located on a printer or scanner device. In one embodiment, computer system 100 is 
a general purpose computer system, such as a conventional personal computer or laptop 
computer, that includes main memory 96, read only memory (ROM) 98, storage device 80, 
processor 92, and communication interface 84, all interconnected by bus 86. Bus 86 also
connects to display 104, cursor control 94, and frame capture 88.

An image capture device, which in this case is a camera pen 70 or camera mouse 76, is
connected to frame capture 88. Either camera pen 70 or camera mouse 76 may be used, but
camera pen 70 will be used for purposes of discussion. Camera pen 70 transmits image 
information to frame capture 88. In one embodiment, camera pen 70 also transmits button 
depression signals to frame capture 88 and cursor control 94. The signals cause frame capture 
88, cursor control 94, and processor 92 to process the images and button depression signals. 

The user makes a selection by placing camera pen 70 on or near visual indicia on 
substrate 106, and pressing one or more buttons on the device. For example, pressing button 74 
causes camera pen 70 to capture the portion of the substrate 66 under the tip of camera pen 70, 
and transmit the image to computer 100, via frame capture 88, for analysis. The image may be a 
portion of substrate 106 that directs processor 92 to retrieve a specific portion of decompressed 
multilingual channel 319 (e.g., one of a plurality of foreign language translations) and to 
properly interpret primary channel 112 in view of multilingual channel 319 to translate it into
human-readable text in a second language. In one embodiment, processor 92 executes programs
which analyze captured portions of substrate 106 to determine address information of a selected
foreign language. The address information is then passed to decoding/decompression module
312 for further processing. The depression of one or more buttons can be used for additional
signaling, such as a double click or hold-down.

In one embodiment, main memory 96 is a random access memoiry (RAM) or a dynamic 
storage device that stores instructions executed by processor 92 and document image or coded 
data. Main memory 96 may also store information used in executing instructions. ROM 98 is 
used for storing static information and instructions used by processor 92. Storage device 80, 
such as a magnetic or optical disk, also stores instructions and data used in the operation of 
computer system 100. 

Display 104 may be a CRT or other type of display device. Cursor control 94 controls 
cursor movement on display 104. Cursor control 94 may accept signals from any type of 
input device, such as camera mouse 76, a trackball, or cursor direction keys. 

The apparatus and methods described herein may be implemented by computer system 
100 using hardware, software, or a combination of hardware and software. For example, the 
apparatus and methods described herein may be implemented as a program in any one or more of 
main memory 96, ROM 98, or storage device 80. The apparatus and methods described herein 
may alternatively be implemented by multilingual decoding module 304 (in the event 
multilingual decoding module 304 is separate from terminal 100). 

Such programs may be read into main memory 96 from another computer-readable 
medium, such as storage device 80. Execution of sequences of instructions contained in main 
memory 96 causes processor 92 to perform the process steps consistent with the present 
invention described herein. Execution of sequences of instructions contained in main memory 
96 also causes processor 92 to implement apparatus elements that perform the process steps. 
Hard-wired circuitry may be used in place of or in combination with software instructions to 
implement the invention. Thus, embodiments of the invention are not limited to any specific 
combination of hardware circuitry and software. 

FIG. 6 illustrates a flow diagram of the general steps for converting a hardcopy of a 
multilingual document into a translated version of the hardcopy document in a second 
human-readable language. As shown in FIG. 6, the process begins in step 510 when a multilingual 
document in accordance with a preferred embodiment of the present invention is input into 
multilingual decoding module 304. While FIG. 3 shows the multilingual document being input 
into multilingual decoding module 304 via a scanner 308, it is well known that a plurality of 
methods and devices may be used to accomplish the data input. More specifically, the data may 
be input from a stored memory location, via a network connection, facsimile (not shown), etc. In 
step 520, a user accessing a user interface selects the destination language and, in step 525, the 
proper segment of the machine-readable code is translated. Next, in step 530, the decoded data 
is transmitted to terminal 100 for display on display 104. As shown in step 540, the displayed 
data may then be optionally transmitted to an attached printer (not shown) for rendering on a 
physical medium (e.g., paper). 
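
By way of illustration only, the steps of FIG. 6 may be sketched in Python as follows. The sketch rests on several assumptions that are not part of the described embodiment: the input document is modeled as a mapping from language name to that language's machine-readable segment, the names `run_fig6_flow` and `zlib_decode` are hypothetical, and zlib compression merely stands in for whatever coding the multilingual decoding module actually applies.

```python
import zlib
from typing import Callable, Dict, Optional


def zlib_decode(segment: bytes) -> str:
    """Stand-in decoder (assumption): zlib-decompress, then read as UTF-8 text."""
    return zlib.decompress(segment).decode("utf-8")


def run_fig6_flow(
    embedded_segments: Dict[str, bytes],
    destination_language: str,
    decode_segment: Callable[[bytes], str],
    display: Callable[[str], None],
    printer: Optional[Callable[[str], None]] = None,
) -> str:
    # Step 510: the multilingual document has been input; it is modeled here
    # as language name -> machine-readable segment (hypothetical structure).
    # Step 520: the user selects the destination language.
    if destination_language not in embedded_segments:
        raise KeyError(f"no embedded segment for {destination_language!r}")
    # Step 525: the proper segment of the machine-readable code is decoded.
    text = decode_segment(embedded_segments[destination_language])
    # Step 530: the decoded data is sent for display.
    display(text)
    # Step 540: the displayed data is optionally sent to a printer.
    if printer is not None:
        printer(text)
    return text
```

A usage example under the same assumptions: with `segments = {"French": zlib.compress("Bonjour".encode("utf-8"))}`, the call `run_fig6_flow(segments, "French", zlib_decode, print)` displays the French text and skips the optional printing step because no printer callback is supplied.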

The term "computer-readable medium" as used herein refers to any medium that 
participates in providing instructions to processor 92 for execution. Such a medium may take 
many forms, including but not limited to, non-volatile memory media, volatile memory media, 
and transmission media. Non-volatile memory media include, for example, optical or magnetic 
disks, such as storage device 80. Volatile memory media include RAM, such as main memory 
96. Transmission media include coaxial cables, copper wire and fiber optics, including the 
wires that comprise bus 86. Transmission media can also take the form of acoustic or light 
waves, such as those generated during radio wave and infrared data communications. 

Common forms of computer-readable media include, for example, a floppy disk, a 
flexible disk, hard disk, magnetic tape, or any other magnetic storage medium, a CD-ROM, any 
other optical medium, punch cards, paper tape, any other physical medium with patterns of holes, 
a RAM, a PROM, an EPROM, a FLASH-EPROM, any other memory chip or cartridge, a carrier 
wave as described hereinafter, or any other medium from which a computer can read and use. 

Various forms of computer-readable media may be involved in carrying one or more 
sequences of instructions to processor 92 for execution. For example, the instructions may 
initially be carried on a magnetic disk of a remote computer. The remote computer can load the 
instructions into its dynamic memory and send the instructions over a telephone line using a 
modem. A modem local to computer system 100 can receive the data on the telephone line and 
use an infrared transmitter to convert the data to an infrared signal. An infrared detector coupled 
to appropriate circuitry can receive the data carried in the infrared signal and place the data on 
bus 86. Bus 86 carries the data to main memory 96, from which processor 92 retrieves and 
executes the instructions. The instructions received by main memory 96 may optionally be 
stored on storage device 80 either before or after execution by processor 92. 

Computer system 100 also includes a communication interface 84 coupled to bus 86. 
Communication interface 84 provides two-way communications to other systems. For example, 
communication interface 84 may be an integrated services digital network (ISDN) card or a 
modem to provide a data communication connection to a corresponding type of telephone line. 
Communication interface 84 may also be, for example, a local area network (LAN) card to provide 
communication to a LAN. Communication interface 84 may also be a wireless card for 
implementing wireless communication between computer system 100 and wireless systems. In 
any such implementation, communication interface 84 sends and receives electrical, 
electromagnetic or optical signals that carry data streams representing various types of 
information. 

The link between communication interface 84 and external devices and systems typically 
provides data communication through one or more networks or other devices. For example, the 
link may provide a connection through a local network (not shown) to a host computer or to data 
equipment operated by an Internet Service Provider (ISP). An ISP provides data communication 
services through the worldwide packet data communications network now commonly referred to 
as the Internet. Local networks and the Internet both use electrical, electromagnetic or optical 
signals that carry digital data streams. The signals through the various networks and the signals 
between the networks and communication interface 84, which carry the digital data to and from 
computer system 100, are exemplary forms of carrier waves transporting the information. 

Computer system 100 can send messages and receive data, including program code, 
through the network(s) via the link between communication interface 84 and the external 
systems and devices. In the Internet, for example, a server might transmit a requested code for 
an application program through the Internet, an ISP, a local network, and communication 
interface 84. 

Program code received over the network may be executed by processor 92 as it is 
received, and/or stored in memory, such as in storage device 80, for later execution. In this 
manner, computer system 100 may obtain application code in the form of a carrier wave. 